## R version 4.4.0 (2024-04-24 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 10 x64 (build 19045)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: Asia/Bangkok
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] tidyr_1.3.1 magrittr_2.0.3 ape_5.8
## [4] cluster_2.1.6 plotrix_3.8-4 scales_1.3.0
## [7] plyr_1.8.9 RColorBrewer_1.1-3 extrafont_0.19
## [10] MALDIquantForeign_0.14.1 MALDIquant_1.22.2
##
## loaded via a namespace (and not attached):
## [1] jsonlite_1.8.8 dplyr_1.1.4 compiler_4.4.0
## [4] tidyselect_1.2.1 Rcpp_1.0.12 parallel_4.4.0
## [7] jquerylib_0.1.4 yaml_2.3.8 fastmap_1.2.0
## [10] lattice_0.22-6 R6_2.5.1 generics_0.1.3
## [13] knitr_1.47 XML_3.99-0.16.1 tibble_3.2.1
## [16] munsell_0.5.1 readBrukerFlexData_1.9.2 pillar_1.9.0
## [19] bslib_0.7.0 rlang_1.1.4 utf8_1.2.4
## [22] cachem_1.1.0 Rttf2pt1_1.3.12 xfun_0.45
## [25] sass_0.4.9 cli_3.6.2 digest_0.6.35
## [28] grid_4.4.0 rstudioapi_0.16.0 base64enc_0.1-3
## [31] lifecycle_1.0.4 nlme_3.1-165 vctrs_0.6.5
## [34] readMzXmlData_2.8.3 evaluate_0.24.0 glue_1.7.0
## [37] extrafontdb_1.0 fansi_1.0.6 colorspace_2.1-0
## [40] purrr_1.0.2 rmarkdown_2.27 pkgconfig_2.0.3
## [43] tools_4.4.0 htmltools_0.5.8.1
Figure 1. Principle of the cross-correlation algorithm. (A) A typical raw MALDI-TOF mass spectrum of An. minimus, (B) the corresponding processed spectrum after intensity normalization and baseline removal, (C) comparison of the processed mass spectra of two An. minimus specimens over the 5000-5500 Da mass interval, showing high similarity between spectra, (D) comparison of the processed mass spectra of an An. minimus specimen and an An. maculatus specimen over the same interval, showing limited similarity between spectra, (E) the cross-correlation function of the two An. minimus spectra over the 5000-5500 Da mass interval, with a local maximum of 0.982, and (F) the cross-correlation function of the An. minimus and An. maculatus spectra over the same interval, with a local maximum of 0.540. If no local maximum of the cross-correlation function is detected, the algorithm is parameterized to return 0. The resulting cross-correlation index on the log scale (log10CCI) over the 3000-12000 Da mass range is -4.9 for the two An. minimus spectra and -Inf for the An. minimus and An. maculatus spectra.
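The principle described in this caption can be sketched in a few lines. This is a minimal illustration, not the pipeline used for the analysis: the function names, the integer-lag window (`max_lag`) and the rule that combines per-interval scores into a single index (a product, so that one interval scoring 0 gives a log10CCI of -Inf) are all assumptions made for the example.

```python
import math

def cross_correlation(x, y, max_lag):
    """Normalized cross-correlation of two equal-length intensity vectors
    for integer lags from -max_lag to +max_lag."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    norm = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if norm == 0:  # a flat signal carries no shape information
        return [0.0] * (2 * max_lag + 1)
    ccf = []
    for lag in range(-max_lag, max_lag + 1):
        total = sum(dx[i] * dy[i + lag] for i in range(n) if 0 <= i + lag < n)
        ccf.append(total / norm)
    return ccf

def interval_score(ccf):
    """Highest interior local maximum of the function, or 0.0 if there is none."""
    peaks = [ccf[i] for i in range(1, len(ccf) - 1)
             if ccf[i - 1] < ccf[i] >= ccf[i + 1]]
    return max(max(peaks), 0.0) if peaks else 0.0

def log10_cci(scores):
    """log10 of the product of per-interval scores (assumed combination rule)."""
    cci = math.prod(scores)
    return math.log10(cci) if cci > 0 else -math.inf

# Toy intensity vector (hypothetical values) compared with itself
spectrum = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0]
print(interval_score(cross_correlation(spectrum, spectrum, 3)))
print(log10_cci([0.982, 0.0]))  # one interval without a local maximum
```

Under the assumed product rule, a single interval without a detectable local maximum is enough to drive the index to -Inf, which is the behavior the caption reports for the An. minimus and An. maculatus pair.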
## [1] "Entomological surveys were carried out in 33 villages between 2020-11-09 and 2022-10-04"
## [1] "Mosquito samples were received in the laboratory 1 to 10 days after collection"
## [1] "Mosquito samples in the reference panel were analyzed by MALDI-TOF MS after 85 to 422 days of storage at -80°C"
## [1] "Mosquito samples in the test panel were analyzed by MALDI-TOF MS after 392 to 1069 days of storage at -20°C"
## [1] "403 Anopheles specimens were selected for inclusion in either the reference or the test panel (270 and 133 specimens, respectively)"
## [1] "254 specimens of the reference panel could be identified with PCR (ITS2: 32; COI: 24; ITS2+COI: 198)"
## [1] "105 specimens of the test panel could be identified with PCR (ITS2: 105; COI: 0; ITS2+COI: 0)"
## [1] "In total, 359 PCR-identified specimens were assigned to 26 taxa including 21 sensu stricto species and 5 sibling species pairs or complexes"
## [1] "An. aconitus s.l." "An. annularis s.l."
## [3] "An. baimaii" "An. campestris/wejchoochotei"
## [5] "An. culicifacies s.l." "An. dirus"
## [7] "An. dissidens" "An. dravidicus"
## [9] "An. interruptus" "An. jamesii"
## [11] "An. jeyporiensis" "An. karwari"
## [13] "An. kochi" "An. maculatus"
## [15] "An. minimus" "An. nivipes"
## [17] "An. peditaeniatus" "An. philippinensis"
## [19] "An. pseudowillmori" "An. saeungae"
## [21] "An. sawadwongporni" "An. sinensis"
## [23] "An. splendidus" "An. tessellatus s.l."
## [25] "An. vagus" "An. varuna"
## [1] "2 specimens identified with COI-PCR as An. baimaii/dirus were excluded from the panel because other specimens of An. baimaii and An. dirus identified with ITS2-PCR were available"
## [1] "4 specimens identified by morphology as Barbirostris Group were removed from the panel despite amplification of ITS2 because the clean portion of the sequence was too short"
| subgenus | group | species | reference | test | total |
|---|---|---|---|---|---|
| Anopheles | Asiaticus | An. interruptus | 1 | 1 | 2 |
| Anopheles | Barbirostris | An. campestris/wejchoochotei | 1 | 0 | 1 |
| Anopheles | Barbirostris | An. dissidens | 24 | 0 | 24 |
| Anopheles | Barbirostris | An. saeungae | 4 | 0 | 4 |
| Anopheles | Hyrcanus | An. peditaeniatus | 3 | 2 | 5 |
| Anopheles | Hyrcanus | An. sinensis | 17 | 2 | 19 |
| Cellia | Annularis | An. annularis s.l. | 10 | 6 | 16 |
| Cellia | Annularis | An. nivipes | 13 | 3 | 16 |
| Cellia | Annularis | An. philippinensis | 4 | 1 | 5 |
| Cellia | Funestus | An. aconitus s.l. | 2 | 6 | 8 |
| Cellia | Funestus | An. culicifacies s.l. | 14 | 5 | 19 |
| Cellia | Funestus | An. jeyporiensis | 6 | 9 | 15 |
| Cellia | Funestus | An. minimus | 40 | 11 | 51 |
| Cellia | Funestus | An. varuna | 2 | 0 | 2 |
| Cellia | Jamesii | An. jamesii | 15 | 2 | 17 |
| Cellia | Jamesii | An. splendidus | 11 | 7 | 18 |
| Cellia | Kochi | An. kochi | 24 | 7 | 31 |
| Cellia | Leucosphyrus | An. baimaii | 12 | 6 | 18 |
| Cellia | Leucosphyrus | An. dirus | 0 | 2 | 2 |
| Cellia | Maculatus | An. dravidicus | 3 | 2 | 5 |
| Cellia | Maculatus | An. maculatus | 16 | 3 | 19 |
| Cellia | Maculatus | An. pseudowillmori | 10 | 3 | 13 |
| Cellia | Maculatus | An. sawadwongporni | 5 | 2 | 7 |
| Cellia | Subpictus | An. vagus | 13 | 8 | 21 |
| Cellia | Tessellatus | An. tessellatus s.l. | 2 | 9 | 11 |
| Cellia | Unclassified | An. karwari | 2 | 8 | 10 |
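As a consistency check, the table can be reconciled with the counts printed earlier (254 PCR-identified reference specimens, 105 test specimens, 359 in total, 26 taxa). The snippet below is only a sketch with the rows copied by hand from the table; the variable names are illustrative.

```python
# (species, reference, test, total), copied from the table above
rows = [
    ("An. interruptus", 1, 1, 2), ("An. campestris/wejchoochotei", 1, 0, 1),
    ("An. dissidens", 24, 0, 24), ("An. saeungae", 4, 0, 4),
    ("An. peditaeniatus", 3, 2, 5), ("An. sinensis", 17, 2, 19),
    ("An. annularis s.l.", 10, 6, 16), ("An. nivipes", 13, 3, 16),
    ("An. philippinensis", 4, 1, 5), ("An. aconitus s.l.", 2, 6, 8),
    ("An. culicifacies s.l.", 14, 5, 19), ("An. jeyporiensis", 6, 9, 15),
    ("An. minimus", 40, 11, 51), ("An. varuna", 2, 0, 2),
    ("An. jamesii", 15, 2, 17), ("An. splendidus", 11, 7, 18),
    ("An. kochi", 24, 7, 31), ("An. baimaii", 12, 6, 18),
    ("An. dirus", 0, 2, 2), ("An. dravidicus", 3, 2, 5),
    ("An. maculatus", 16, 3, 19), ("An. pseudowillmori", 10, 3, 13),
    ("An. sawadwongporni", 5, 2, 7), ("An. vagus", 13, 8, 21),
    ("An. tessellatus s.l.", 2, 9, 11), ("An. karwari", 2, 8, 10),
]

reference_total = sum(r[1] for r in rows)
test_total = sum(r[2] for r in rows)
# Rows whose "total" column is not the sum of the two panel columns
inconsistent = [r[0] for r in rows if r[1] + r[2] != r[3]]

print(len(rows), reference_total, test_total, reference_total + test_total)
print(inconsistent)
```

The column sums agree with the printed counts, and no row has a total that differs from the sum of its two panels.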
## [1] "2535 mass spectra of the 254 reference Anopheles specimens identified with PCR were acquired, yielding 3211845 pairwise comparisons between distinct spectra"
## [1] "The median log10CCI was -7.9 (IQR: -9.2 to -6.8) for comparisons of technical replicates of the same specimen."
## [1] "The median log10CCI was -10.7 (IQR: -12.6 to -9.4) for comparisons between spectra of different specimens of the same species."
## [1] "The median log10CCI was -Inf (IQR: -Inf to -Inf) for comparisons between spectra of different species."
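The comparison count reported above is the number of unordered pairs among 2535 spectra, n(n-1)/2. The sketch below checks that figure and shows one way to sort pairs into the three categories used to assess repeatability, reproducibility and specificity (same specimen, different specimens of the same species, different species). The identifiers and the toy data are hypothetical.

```python
from itertools import combinations
from math import comb

def classify_pair(a, b):
    """a and b are (specimen_id, species) tuples describing two spectra."""
    if a[0] == b[0]:
        return "same specimen"
    if a[1] == b[1]:
        return "same species"
    return "different species"

def count_pairs(spectra):
    """Count unordered pairs of spectra in each comparison category."""
    counts = {"same specimen": 0, "same species": 0, "different species": 0}
    for a, b in combinations(spectra, 2):
        counts[classify_pair(a, b)] += 1
    return counts

# Toy panel: three specimens with two technical replicates each
toy = [("s1", "An. minimus"), ("s1", "An. minimus"),
       ("s2", "An. minimus"), ("s2", "An. minimus"),
       ("s3", "An. maculatus"), ("s3", "An. maculatus")]

print(count_pairs(toy))
print(comb(2535, 2))  # 3211845
```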
Figure 3. Repeatability, reproducibility and specificity of the
mass spectra. (A) median log10CCI of pairwise comparisons
between technical replicates of the same specimen collated by mass
spectrum and (B) corresponding density function, (C) median log10CCI of
pairwise comparisons between spectra of different specimens of the same
species collated by mass spectrum and (D) corresponding density
function, (E) median log10CCI of pairwise comparisons between spectra of
different species collated by mass spectrum and (F) corresponding
density function. Spectra with low intra-specimen
(median log10CCI < -12, 39 spectra) and inter-specimen
(median log10CCI < -14, 115 spectra) reproducibility are shown in orange
in panels A and C, respectively.
Figure 4. Heat map grid of the median cross-correlation index collated by specimen included in the reference mass spectra database. Red on the diagonal shows the high reproducibility of mass spectra, and blue off the diagonal shows their high specificity. Orange off the central diagonal shows the high similarity between sibling species of the Barbirostris complex and between some species of the Neomyzomyia series. Negative infinity values are shown in white.
Figure S1. Dendrogram showing the output of hierarchical
clustering analysis.
## [1] "2477/2535 (97.7%) of the spectra included in the reference database matched with the same species (median log10CCI: -7.8; IQR: -8.8 to -7.0)"
## [1] "58/2535 (2.3%) of the spectra included in the reference database matched with another species"
## [1] "Among the mismatches, 19 were spectra of species represented by only one specimen and thus not included in the queried dataset because self-matching was disabled (median log10CCI: -13.4; IQR: -14.1 to -9.6), and 39 were true cross-matches between two referenced species (median log10CCI: -11.7; IQR: -13.6 to -9.8)."
Figure 5. Distribution of the maximum log10CCI value collated by spectra in bank-to-bank comparison by category of result, excluding comparisons between technical replicates of the same sample. (A) Correct matches with another specimen of the same species, (B) true cross-matches between a species referenced in the queried database and a different species, (C) species represented by only one specimen in the reference mass spectra database and therefore not included in the queried database because self-match was disabled.
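The three result categories of the bank-to-bank comparison can be illustrated with a small lookup in which spectra from the same specimen are skipped (self-matching disabled). This is a sketch under assumptions: the data layout, the function names and the toy scores are hypothetical, and a missing pair is treated as a log10CCI of -Inf.

```python
import math

def best_match(query_id, spectra, scores):
    """Best-scoring spectrum for `query_id` among spectra of other specimens.
    spectra: {spectrum_id: (specimen_id, species)}
    scores:  {frozenset({id_a, id_b}): log10CCI}"""
    query_specimen = spectra[query_id][0]
    best_id, best_score = None, -math.inf
    for other_id, (specimen, _species) in spectra.items():
        if specimen == query_specimen:  # self-matching disabled
            continue
        score = scores.get(frozenset((query_id, other_id)), -math.inf)
        if best_id is None or score > best_score:
            best_id, best_score = other_id, score
    return best_id, best_score

def categorize(query_id, spectra, scores):
    """Label the best match as correct, cross-match, or singleton species."""
    query_specimen, query_species = spectra[query_id]
    has_conspecific = any(species == query_species and specimen != query_specimen
                          for specimen, species in spectra.values())
    hit, _score = best_match(query_id, spectra, scores)
    if hit is not None and spectra[hit][1] == query_species:
        return "correct match"
    return "cross-match" if has_conspecific else "singleton species"

# Toy database: species X has two specimens (A, B), species Y has one (C)
spectra = {"a1": ("A", "X"), "a2": ("A", "X"), "b1": ("B", "X"), "c1": ("C", "Y")}
scores = {frozenset(("a1", "a2")): -5.0, frozenset(("a1", "b1")): -7.0,
          frozenset(("a1", "c1")): -12.0, frozenset(("a2", "b1")): -13.0,
          frozenset(("a2", "c1")): -11.0, frozenset(("b1", "c1")): -12.0}

for spectrum_id in spectra:
    print(spectrum_id, categorize(spectrum_id, spectra, scores))
```

In the toy data, `c1` can only match another species because its own species has a single specimen, which corresponds to category C of the figure.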
| species.in | species.out | n | median | q25 | q75 |
|---|---|---|---|---|---|
| An. dissidens | An. saeungae | 2 | -8.029531 | -8.566749 | -7.492313 |
| An. dravidicus | An. maculatus | 4 | -9.730305 | -9.995887 | -9.570995 |
| An. dravidicus | An. minimus | 1 | -10.019018 | -10.019018 | -10.019018 |
| An. dravidicus | An. nivipes | 1 | -10.218182 | -10.218182 | -10.218182 |
| An. dravidicus | An. splendidus | 1 | -9.876309 | -9.876309 | -9.876309 |
| An. karwari | An. dissidens | 1 | -11.807625 | -11.807625 | -11.807625 |
| An. karwari | An. jamesii | 1 | -9.879874 | -9.879874 | -9.879874 |
| An. kochi | An. pseudowillmori | 1 | -16.999292 | -16.999292 | -16.999292 |
| An. pseudowillmori | An. maculatus | 2 | -11.817782 | -11.861658 | -11.773905 |
| An. saeungae | An. dissidens | 5 | -7.054183 | -7.469164 | -7.024720 |
| An. sawadwongporni | An. maculatus | 8 | -13.762174 | -13.998237 | -13.640210 |
| An. sawadwongporni | An. nivipes | 2 | -14.169721 | -14.328012 | -14.011430 |
| An. tessellatus s.l. | An. kochi | 10 | -12.176384 | -13.076071 | -10.951927 |
| | spectrum.in | spectrum.out | sample.in | sample.out | species.in | species.out | cci_log |
|---|---|---|---|---|---|---|---|
| 207522 | 202201060900_1K1 | 202103050900_2I2 | METF2_0031708 | METF2_0006000 | An. campestris/wejchoochotei | An. dissidens | -9.150524 |
| 210058 | 202201060900_1K2 | 202103050900_2I3 | METF2_0031708 | METF2_0006000 | An. campestris/wejchoochotei | An. dissidens | -9.547698 |
| 204989 | 202201060900_1K3 | 202103050900_2I1 | METF2_0031708 | METF2_0006000 | An. campestris/wejchoochotei | An. dissidens | -10.236264 |
| 220200 | 202201060900_1K4 | 202103050900_2J3 | METF2_0031708 | METF2_0006000 | An. campestris/wejchoochotei | An. dissidens | -9.322730 |
| 225271 | 202201060900_1L1 | 202103050900_2K1 | METF2_0031708 | METF2_0006000 | An. campestris/wejchoochotei | An. dissidens | -9.099370 |
| 204993 | 202201060900_1L4 | 202103050900_2I1 | METF2_0031708 | METF2_0006000 | An. campestris/wejchoochotei | An. dissidens | -9.225623 |
| 210064 | 202201060900_2A1 | 202103050900_2I3 | METF2_0031708 | METF2_0006000 | An. campestris/wejchoochotei | An. dissidens | -9.626956 |
| 222740 | 202201060900_2A2 | 202103050900_2J4 | METF2_0031708 | METF2_0006000 | An. campestris/wejchoochotei | An. dissidens | -10.053820 |
| 5574122 | 202201060900_1L3 | 202201060900_2B2 | METF2_0031708 | METF2_0031712 | An. campestris/wejchoochotei | An. saeungae | -10.225270 |
| 3161915 | 202103110900_2F3 | 202103290900_1H2 | METF2_0018441 | METF2_0009082 | An. interruptus | An. minimus | -13.402686 |
| 2414091 | 202103110900_2F4 | 202103190900_3B4 | METF2_0018441 | METF2_0005907 | An. interruptus | An. minimus | -13.853556 |
| 1453327 | 202103110900_2G1 | 202103101200_2B3 | METF2_0018441 | METF2_0013870 | An. interruptus | An. minimus | -16.529050 |
| 548333 | 202103110900_2G2 | 202103080900_2C2 | METF2_0018441 | METF2_0005843 | An. interruptus | An. minimus | -15.613003 |
| 2416629 | 202103110900_2G3 | 202103190900_3C1 | METF2_0018441 | METF2_0005907 | An. interruptus | An. minimus | -13.825802 |
| 959005 | 202103110900_2G4 | 202103090900_1H2 | METF2_0018441 | METF2_0008991 | An. interruptus | An. minimus | -14.449650 |
| 3156851 | 202103110900_2H1 | 202103290900_1G4 | METF2_0018441 | METF2_0009082 | An. interruptus | An. minimus | -14.295210 |
| 2424237 | 202103110900_2H2 | 202103190900_3C4 | METF2_0018441 | METF2_0005907 | An. interruptus | An. minimus | -13.944487 |
| 2416633 | 202103110900_2H3 | 202103190900_3C1 | METF2_0018441 | METF2_0005907 | An. interruptus | An. minimus | -13.601965 |
| 2424239 | 202103110900_2H4 | 202103190900_3C4 | METF2_0018441 | METF2_0005907 | An. interruptus | An. minimus | -14.568332 |
| species.in | species.out | n | median | q25 | q75 |
|---|---|---|---|---|---|
| An. campestris/wejchoochotei | An. dissidens | 8 | -9.435214 | -9.733672 | -9.206849 |
| An. campestris/wejchoochotei | An. saeungae | 1 | -10.225270 | -10.225270 | -10.225270 |
| An. interruptus | An. minimus | 10 | -14.119848 | -14.538661 | -13.832740 |
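The summary row for An. campestris/wejchoochotei matched to An. dissidens can be recomputed from the eight individual values listed in the preceding table. The sketch below assumes quartiles obtained by linear interpolation between order statistics (the `inclusive` method of Python's `statistics.quantiles`); that this is the rule used to build the table is an assumption, although the results agree to within rounding of the displayed values.

```python
import statistics

# log10CCI of the eight An. campestris/wejchoochotei -> An. dissidens matches
values = [-9.150524, -9.547698, -10.236264, -9.322730,
          -9.099370, -9.225623, -9.626956, -10.053820]

def summarize(xs):
    """Count, median and quartiles with linear interpolation."""
    q25, median, q75 = statistics.quantiles(xs, n=4, method="inclusive")
    return {"n": len(xs), "median": median, "q25": q25, "q75": q75}

summary = summarize(values)
print({k: round(v, 6) for k, v in summary.items()})
```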
| species | n.samples | n.spectra | median.log10.cci | iqr.log10.cci |
|---|---|---|---|---|
| An. aconitus s.l. | 2 | 20 | -9.3 | -9.7 to -8.9 |
| An. annularis s.l. | 10 | 100 | -8.8 | -9.2 to -8.3 |
| An. baimaii | 12 | 120 | -7.9 | -8.9 to -6.6 |
| An. culicifacies s.l. | 14 | 140 | -8.0 | -8.6 to -7.5 |
| An. dissidens | 24 | 238 | -7.7 | -8.7 to -7 |
| An. dravidicus | 3 | 23 | -10.3 | -10.8 to -10.1 |
| An. jamesii | 15 | 150 | -7.8 | -8.4 to -7.2 |
| An. jeyporiensis | 6 | 60 | -8.0 | -9.5 to -7.7 |
| An. karwari | 2 | 18 | -10.5 | -10.8 to -10.2 |
| An. kochi | 24 | 237 | -7.7 | -8.5 to -6.6 |
| An. maculatus | 16 | 160 | -8.3 | -9 to -7.4 |
| An. minimus | 40 | 400 | -7.3 | -8.2 to -6.5 |
| An. nivipes | 13 | 129 | -7.1 | -8.2 to -6.2 |
| An. peditaeniatus | 3 | 30 | -9.6 | -10.5 to -9.2 |
| An. philippinensis | 4 | 40 | -8.9 | -9.5 to -8.3 |
| An. pseudowillmori | 10 | 98 | -7.3 | -7.9 to -6.6 |
| An. saeungae | 4 | 35 | -7.5 | -7.9 to -7.2 |
| An. sawadwongporni | 4 | 40 | -9.3 | -10.3 to -8.8 |
| An. sinensis | 17 | 170 | -7.1 | -7.6 to -6.4 |
| An. splendidus | 11 | 109 | -7.5 | -8.3 to -7 |
| An. tessellatus s.l. | 2 | 10 | -11.3 | -11.6 to -11.1 |
| An. vagus | 13 | 130 | -8.0 | -8.6 to -7.3 |
| An. varuna | 2 | 20 | -9.6 | -10.2 to -9.3 |
## [1] "1049 mass spectra of the 105 PCR-identified specimens included in the test panel were queried against the reference database, yielding 2659215 pairwise comparisons"
Figure 6. Evaluation of the performance of the reference mass spectra database for Anopheles species identification using the test panel. (A) Sensitivity and specificity determined at varying identification thresholds considering one spot per specimen, (B) corresponding receiver operating characteristic (ROC) curve, (C) sensitivity and specificity determined at varying identification thresholds considering four spots per specimen and (D) corresponding ROC curve. The shaded areas in panels A and C show the 95% credible interval around the median estimate of 1000 simulations. The dashed line in panels B and D shows the performance of a random classification.
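To make the threshold-dependent metrics concrete, here is one conventional way to compute them for a thresholded identification. The definitions are an assumption and may differ from those used for the figure: a result is called positive when its best log10CCI reaches the threshold, and it counts as a true positive only if the matched species is the PCR-identified one. The toy results are hypothetical.

```python
def diagnostic_metrics(results, threshold):
    """results: list of (best_log10_cci, match_is_correct) tuples."""
    tp = sum(1 for s, ok in results if s >= threshold and ok)
    fp = sum(1 for s, ok in results if s >= threshold and not ok)
    fn = sum(1 for s, ok in results if s < threshold and ok)
    tn = sum(1 for s, ok in results if s < threshold and not ok)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, len(results)),
    }

# Toy results: four correct best matches and two incorrect ones
toy = [(-8.0, True), (-9.0, True), (-11.0, True), (-13.0, True),
       (-12.5, False), (-15.0, False)]

for threshold in (-14, -12, -10):
    m = diagnostic_metrics(toy, threshold)
    # One ROC point per threshold: (1 - specificity, sensitivity)
    print(threshold, m, (1 - m["specificity"], m["sensitivity"]))
```

Lowering the threshold accepts more matches, so sensitivity rises while specificity falls, which is the trade-off traced by the ROC curves.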
| | threshold | spot | sensitivity2 | specificity2 | ppv2 | accuracy2 |
|---|---|---|---|---|---|---|
| 109 | -14 | 1 | 0.96 (0.92 to 0.99) | 0.39 (0.32 to 0.46) | 0.94 (0.92 to 0.96) | 0.9 (0.87 to 0.94) |
| 110 | -14 | 2 | 1 (0.98 to 1) | 0.27 (0.22 to 0.31) | 0.94 (0.91 to 0.96) | 0.93 (0.9 to 0.96) |
| 112 | -14 | 4 | 1 (1 to 1) | 0.19 (0.16 to 0.23) | 0.93 (0.91 to 0.95) | 0.93 (0.91 to 0.95) |
| 117 | -14 | 9 | 1 (1 to 1) | 0.14 (0.14 to 0.16) | 0.92 (0.92 to 0.93) | 0.92 (0.92 to 0.93) |
| 127 | -13 | 1 | 0.91 (0.86 to 0.95) | 0.65 (0.58 to 0.71) | 0.95 (0.93 to 0.97) | 0.87 (0.83 to 0.9) |
| 128 | -13 | 2 | 0.96 (0.93 to 0.99) | 0.51 (0.46 to 0.58) | 0.95 (0.92 to 0.97) | 0.91 (0.88 to 0.94) |
| 130 | -13 | 4 | 0.99 (0.97 to 1) | 0.4 (0.36 to 0.45) | 0.93 (0.91 to 0.96) | 0.92 (0.9 to 0.95) |
| 135 | -13 | 9 | 1 (0.99 to 1) | 0.3 (0.3 to 0.32) | 0.92 (0.92 to 0.93) | 0.92 (0.91 to 0.93) |
| 145 | -12 | 1 | 0.82 (0.77 to 0.87) | 0.85 (0.8 to 0.9) | 0.97 (0.94 to 0.99) | 0.81 (0.75 to 0.86) |
| 146 | -12 | 2 | 0.89 (0.86 to 0.92) | 0.78 (0.73 to 0.83) | 0.96 (0.94 to 0.98) | 0.86 (0.83 to 0.89) |
| 148 | -12 | 4 | 0.92 (0.89 to 0.94) | 0.71 (0.68 to 0.76) | 0.95 (0.94 to 0.97) | 0.88 (0.85 to 0.9) |
| 153 | -12 | 9 | 0.95 (0.93 to 0.95) | 0.62 (0.61 to 0.65) | 0.94 (0.94 to 0.95) | 0.9 (0.88 to 0.9) |
| 163 | -11 | 1 | 0.69 (0.63 to 0.75) | 0.96 (0.93 to 0.99) | 1 (0.97 to 1) | 0.7 (0.63 to 0.74) |
| 164 | -11 | 2 | 0.79 (0.75 to 0.82) | 0.93 (0.9 to 0.96) | 0.99 (0.98 to 1) | 0.79 (0.74 to 0.82) |
| 166 | -11 | 4 | 0.84 (0.81 to 0.87) | 0.9 (0.88 to 0.92) | 0.99 (0.98 to 1) | 0.83 (0.8 to 0.87) |
| 171 | -11 | 9 | 0.89 (0.87 to 0.89) | 0.86 (0.86 to 0.88) | 0.98 (0.98 to 0.99) | 0.88 (0.86 to 0.89) |
| 181 | -10 | 1 | 0.48 (0.42 to 0.53) | 1 (1 to 1) | 1 (1 to 1) | 0.49 (0.43 to 0.54) |
| 182 | -10 | 2 | 0.59 (0.54 to 0.64) | 1 (1 to 1) | 1 (1 to 1) | 0.6 (0.55 to 0.65) |
| 184 | -10 | 4 | 0.69 (0.65 to 0.72) | 1 (1 to 1) | 1 (1 to 1) | 0.7 (0.66 to 0.72) |
| 189 | -10 | 9 | 0.75 (0.73 to 0.75) | 1 (1 to 1) | 1 (1 to 1) | 0.75 (0.73 to 0.75) |
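The intervals in this table summarize repeated simulations for a given number of spots per specimen. A minimal sketch of such a resampling scheme follows. It is an assumption, not the procedure behind the table: it supposes that a specimen is identified when the best of its sampled spots reaches the threshold, and it approximates the interval bounds by rank. Function and parameter names are hypothetical.

```python
import random
import statistics

def simulate_sensitivity(specimens, threshold, n_spots, n_sim=1000, seed=0):
    """specimens: one list per specimen holding the best log10CCI of each spot.
    Returns the median sensitivity and an approximate 95% interval."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sim):
        hits = 0
        for spots in specimens:
            # Draw n_spots spots without replacement (all spots if fewer exist)
            sampled = rng.sample(spots, min(n_spots, len(spots)))
            if max(sampled) >= threshold:
                hits += 1
        estimates.append(hits / len(specimens))
    estimates.sort()
    lower = estimates[int(0.025 * (n_sim - 1))]
    upper = estimates[int(0.975 * (n_sim - 1))]
    return statistics.median(estimates), (lower, upper)

# Toy panel: 20 specimens, each with one good spot out of four
panel = [[-9.0, -13.0, -13.0, -13.0] for _ in range(20)]
one_spot = simulate_sensitivity(panel, threshold=-12, n_spots=1)
four_spots = simulate_sensitivity(panel, threshold=-12, n_spots=4)
print(one_spot, four_spots)
```

With this rule, using more spots per specimen can only increase the chance of reaching the threshold, which is in line with the rise in sensitivity from 1 to 9 spots seen in the table.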